Discovering structural similarities in narrative texts using event alignment algorithms

Author

  • Nils Reiter
Abstract

This thesis is about the discovery of structural similarities across narrative texts. We will describe a method based on event alignments that are created automatically on automatically preprocessed texts. This opens up a path to large-scale empirical research on structural similarities across texts. Structural similarities are of interest for many areas in the humanities and social sciences. We will focus on folkloristics and ritual research as application scenarios.

Folkloristics studies folktales, i.e., tales that have been passed down orally for a long time. Similarities across different folktales have been observed at the level of individual events (being abandoned in the woods), at the level of participants (the gingerbread house), and structurally: events do not happen at random, but in a certain order. Rituals are an omnipresent part of human behavior and are studied in ethnology, the social sciences and history. Similarities across types of rituals have been observed and have sparked a discussion about the structural principles that govern how individual ritual elements combine into rituals.

As descriptions of rituals feature many uncommon language constructions, we will also discuss methods of domain adaptation in order to adapt existing NLP components to the domain of rituals. We will mainly use supervised methods and employ retraining as a means of adaptation. This presupposes annotating small amounts of domain data. We will discuss the following linguistic levels: part-of-speech tagging, chunking, dependency parsing, word sense disambiguation, semantic role labeling and coreference resolution. On all levels, we have achieved improvements. We will also describe how these annotation levels are brought together in a single, integrated discourse representation that is the basis for further experiments.

In order to discover structural similarities, we employ three different alignment algorithms and use them to align semantically similar events. Sequence alignment (Needleman-Wunsch) is a classic algorithm with limited capabilities. A graph-based event alignment system that was developed for newspaper texts will be used for comparison. As a third algorithm, we employ Bayesian model merging, which induces a hidden Markov model from which we extract an alignment.

We will evaluate the algorithms in two experiments. In the first experiment, we evaluate against a gold standard of aligned descriptions of rituals. Bayesian model merging and predicate alignment achieve the best results, measured using the BLANC metric. Due to difficulties in creating an event alignment gold standard, the second experiment is based on cluster induction. Although this is not a strict evaluation of structural similarities, it gives some insight into the behavior of the algorithms. We induce a document similarity measure from the generated alignments and use this measure to cluster the documents. The clustering is then compared against a ...
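
The sequence alignment step mentioned in the abstract can be illustrated with a minimal Needleman-Wunsch sketch. It is not the thesis implementation: the event labels, the `exact_match` similarity function and the gap penalty of -1.0 are assumptions made for illustration, and the thesis aligns semantically similar events rather than identical strings.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def exact_match(x: str, y: str) -> float:
    # Hypothetical similarity: a stand-in for a semantic event similarity.
    return 1.0 if x == y else -1.0


def needleman_wunsch(
    a: List[str],
    b: List[str],
    sim: Callable[[str, str], float],
    gap: float = -1.0,
) -> Tuple[float, List[Pair]]:
    """Globally align two event sequences; None marks a gap."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] records the move that produced score[i][j]:
    # "d" = align a[i-1] with b[j-1], "u" = gap in b, "l" = gap in a.
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
        back[i][0] = "u"
    for j in range(1, m + 1):
        score[0][j] = j * gap
        back[0][j] = "l"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]), "d"),
                (score[i - 1][j] + gap, "u"),
                (score[i][j - 1] + gap, "l"),
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the bottom-right cell to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "d":
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif move == "u":
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Two invented ritual descriptions, reduced to event labels.
ritual_a = ["enter", "wash", "offer", "leave"]
ritual_b = ["enter", "offer", "chant", "leave"]
total, alignment = needleman_wunsch(ritual_a, ritual_b, exact_match)
for left, right in alignment:
    print(left or "-", right or "-")
```

Because the alignment is global and strictly order-preserving, events that occur in a different order in the two texts cannot be aligned with each other, which is one sense in which the abstract's remark about limited capabilities can be read.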

Similar resources

Discovering and visualizing narrative themes

This paper presents a framework for indexing and browsing databases of stories, in particular characterizing and visually exploring each narrative’s thematic content. We introduce a method for discovering thematic content in texts via lexical dissimilarity statistics. A maximum-likelihood algorithm clusters words into pools of similar meaning, using a thesaurus for rough estimates of word sense ...

A Comparative Study of the Narrative Structure of ‘Attār's Mosibat Nāmah and Sanā'i's Seyr al-‘Ebād as "Dream Allegory"

The Mosibat Nāmah, written by ‘Attār Nishabouri (553-627 A.H.), is the narrative of a spiritual journey in the form of a mathnavi (a story in rhyming verse). The poet also used the device of the spiritual journey in another of his works, The Manteq al-Tayr (The Conference of Birds). Both works have a similar ending, although the narratives differ. Among the narrative elements of The Mo...

Cross-narrative Temporal Ordering of Medical Events

Cross-narrative temporal ordering of medical events is essential to the task of generating a comprehensive timeline over a patient’s history. We address the problem of aligning multiple medical event sequences, corresponding to different clinical narratives, comparing the following approaches: (1) A novel weighted finite state transducer representation of medical event sequences that enables co...

Cross-Document Non-Fiction Narrative Alignment

This paper describes a new method for narrative frame alignment that extends and supplements models reliant on graph theory from the domain of fiction to the domain of nonfiction news articles. Preliminary tests of this method against a corpus of 24 articles related to private security firms operating in Iraq and the Blackwater shooting of 2007 show that prior methods utilizing a graph similari...

Referential Cohesion in Hungarian: A Developmental Study

Discursive functions are shared across all languages, but each language uses different linguistic means to appropriately establish referential cohesion. Children’s mastery of this cohesion in narrative texts develops gradually and is influenced by development in syntax. Consequently, speakers can employ different strategies, and among the various structural configurations of arguments, some are...

Publication date: 2014